Abstract

Long paths and cycles in sparse random graphs and digraphs were studied intensively in the
1980's. It was finally shown by Frieze in 1986 that the random graph $\mathcal{G}(n,p)$ with $p = c/n$ has
a cycle on all but at most $(1+\varepsilon)ce^{-c}n$ vertices with high probability, where
$\varepsilon = \varepsilon(c) \to 0$ as $c \to \infty$. This estimate on the number of uncovered vertices is
essentially tight due to vertices of degree 1. However, for the random digraph $\mathcal{D}(n,p)$ no tight
result was known and the best estimate was a factor of $c/2$ away from the corresponding lower bound.
In this work we close this gap and show that the random digraph $\mathcal{D}(n,p)$ with $p = c/n$ has a
cycle containing all but $(2+\varepsilon)e^{-c}n$ vertices w.h.p., where $\varepsilon = \varepsilon(c) \to 0$ as
$c \to \infty$. This is essentially tight since w.h.p. such a random digraph contains
$(2e^{-c} - o(1))n$ vertices with zero in-degree or out-degree.
1 Introduction

In this paper we consider long cycles in random directed graphs, aiming to obtain estimates analogous
to those derived for the undirected case. Formally, a random graph $\mathcal{G}(n,p)$ is a probability space of
all graphs with vertex set $[n]$, where each pair of vertices $1 \le i < j \le n$ is an edge of
$G \sim \mathcal{G}(n,p)$ independently and with probability $p$. The model of random directed graphs
$\mathcal{D}(n,p)$ is defined as the probability space of all directed graphs with vertex set $[n]$ (without
loops and without parallel edges, but possibly with anti-parallel edges), where each ordered pair
$(i,j)$ with $1 \le i \ne j \le n$ is a directed edge of $D \sim \mathcal{D}(n,p)$ independently and with
probability $p$.

The existence of long paths and cycles in sparse random graphs was a subject of very intensive
study in the eighties. Ajtai, Komlós and Szemerédi proved in [1] that with high probability in the
random graph $\mathcal{G}(n,c/n)$ there is a path of length $\alpha(c)n$, where $\alpha(c) > 0$ for $c > 1$ and
$\lim_{c\to\infty}\alpha(c) = 1$; a similar but somewhat weaker result was proved independently by
Fernandez de la Vega [7]. Attention then shifted to estimating the asymptotic behavior of the number
of vertices uncovered by a longest path/cycle. Improving upon prior results of Bollobás [5] and
Bollobás, Fenner and Frieze [6], Frieze finally settled this problem: he showed in [9] that w.h.p.
$G \sim \mathcal{G}(n,c/n)$ contains a cycle covering all but at most $(1+\varepsilon)ce^{-c}n$ vertices, where
$\lim_{c\to\infty}\varepsilon(c) = 0$. This estimate is easily seen to be asymptotically tight as
$\mathcal{G}(n,c/n)$ w.h.p. contains $(1+o(1))ce^{-c}n$ vertices of degree at most 1, all of which have to
be missed by a cycle.

For random directed graphs the situation appears to be more complicated. This is to be expected
as the research experience of many years has shown that problems related to long paths and cycles in
directed (random) graphs are usually much more challenging than their undirected counterparts. In
the aforementioned paper [9] Frieze further established that w.h.p. $\mathcal{D}(n,p)$ contains a cycle
covering all but at most $(1+\varepsilon)ce^{-c}n$ vertices, where $\lim_{c\to\infty}\varepsilon(c) = 0$. This result
was derived by appealing to a general theorem of McDiarmid [11], coupling between events in
$\mathcal{G}(n,p)$ and in $\mathcal{D}(n,p)$. Unlike in the undirected case, the above estimate on the number of
vertices uncovered by a longest cycle is no longer asymptotically tight: the unavoidable loss in the
directed case are vertices of in-degree or out-degree zero, whose number is easily seen to be
asymptotic to $2e^{-c}n$.
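The unavoidable loss from vertices of zero in-degree or out-degree can be checked empirically. The following is a minimal Python sketch, not part of the original argument; the function names `sample_digraph`, `zero_degree_fraction` and `limiting_fraction` are ours. It samples $\mathcal{D}(n,c/n)$ directly from the definition and compares the observed fraction of such vertices with $2e^{-c} - e^{-2c}$, the inclusion-exclusion value obtained when the in-degree and out-degree of a vertex are treated as independent Poisson($c$) variables; this agrees with $2e^{-c}$ up to the lower-order $e^{-2c}$ term.

```python
import math
import random


def sample_digraph(n, p, rng):
    """Sample D(n, p): every ordered pair (i, j) with i != j is an edge
    independently with probability p. Returns out-neighbour lists."""
    out_nbrs = [[] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < p:
                out_nbrs[i].append(j)
    return out_nbrs


def zero_degree_fraction(out_nbrs):
    """Fraction of vertices whose in-degree or out-degree is zero."""
    n = len(out_nbrs)
    in_deg = [0] * n
    for nbrs in out_nbrs:
        for j in nbrs:
            in_deg[j] += 1
    bad = sum(1 for v in range(n) if in_deg[v] == 0 or not out_nbrs[v])
    return bad / n


def limiting_fraction(c):
    """Poisson heuristic: P(in = 0 or out = 0) = 2e^{-c} - e^{-2c}."""
    return 2 * math.exp(-c) - math.exp(-2 * c)
```

For moderate $n$ (a thousand vertices or so) and fixed $c$ the two quantities are typically close; the quadratic-time sampler is chosen for transparency, not speed.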

In this paper we close the gap left by Frieze's work and obtain an asymptotically optimal result 
about longest cycles in sparse random digraphs. 

Theorem 1. Let $D \sim \mathcal{D}(n,p)$ be a random digraph with edge probability $p = c/n$ for fixed
$c > 1$. Then w.h.p. $D$ contains a directed cycle that covers all but at most $(2+\varepsilon)e^{-c}n$
vertices, where $\varepsilon = \varepsilon(c) \to 0$ as $c \to \infty$, and this is asymptotically tight as w.h.p.
$(2e^{-c} - o(1))n$ vertices of $D$ have zero in-degree or out-degree.

The proof of the theorem is given in the next section. In certain similarity to Frieze's argument 
in [9] we proceed by first filtering out vertices of zero in-degree or out-degree as well as some vertices 
close to them. The so obtained digraph typically retains all but a negligible fraction of the vertices 
of positive in-degrees and out-degrees; it is then upgraded to another random digraph, containing an 
almost spanning cycle, by sprinkling a few more random directed edges. 

Before we embark on the technicalities of the proof, we provide its outline, aiming to help the
reader parse the proof's details.

The proof has two components/stages: filtering and factoring. The filtering stage aims to filter out
vertices of in- or out-degree zero and possibly some other vertices and to produce an induced subgraph
$D_0$ of $D \sim \mathcal{D}(n,p)$, containing most of the vertices of positive degree; moreover, $D_0$ is
constructed in a way making it rather straightforward to show that it typically contains a factor of
directed cycles. In order to produce $D_0$, we define the following iterative process. Let
$Y = \{v : d_D(v) = 0\}$, and let $Z = \{v : d_D(v) \le 3\}$, where $d_D(v) = \min\{d_D^+(v), d_D^-(v)\}$.
We start with $X_0 = \emptyset$, and then for $k \ge 1$ we obtain $X_k$ by including vertices not in
$\bigcup_{i<k} X_i$ lying on a short path connecting two vertices $x, y \in \bigcup_{j<k} X_j \cup Z$. We
repeat this process till it stabilizes and set $X = \bigcup_k X_k$. Finally, the subgraph $D_0$ is defined
by $D_0 = D[V - (X \cup Y)]$. Observe that for a given vertex $v$ the probability that $d_D(v)$ is at most
some absolute constant (independent of $c$) is at most $\mathrm{poly}(c)e^{-c}$. Thus, for a given $v$ the
probability of having two vertices of small degree at a constant distance from $v$ is
$\mathrm{poly}(c)e^{-2c}$. Hence, we can expect to eventually have $|X| \le \mathrm{poly}(c)e^{-2c}n$, and this
is indeed what we prove. To facilitate the proof, we first get rid of short cycles (of length
$O((1/c)\log n)$) in the underlying undirected graph $G$ of $D$; they typically touch very few vertices.
Analyzing the filtering process in a large girth graph is easier: for every $v \in X_k$ there should be
evidence for its association with $X_k$ in the form of a tree $T_v$ rooted at $v$, of prescribed order
and depth and with $\ell$ leaves, where $k < \ell \le 2^k$. This is proven in Lemma 2.6. Using this lemma
we can bound the size of the set $X_k$ (or rather of a set $X'_k$ closely related to $X_k$ and defined
through an analogous filtering process) by the number of labeled rooted trees meeting these
requirements. This is done by first truncating unusually deep trees (Lemma 2.7) and then bounding from
above the expected number of trees of bounded depth. The final argument invokes martingales to show
the concentration of the corresponding random variable around its mean and to bound its upper tail.
All this is done in Theorem 2.2.

The factoring stage takes the induced subgraph $D_0$, the output of the filtering stage, as an input.
By Theorem 2.2 we know that with high probability $D_0$ contains all but a suitably small part of the
vertices of $D$ of positive degree. Moreover, one can prove (Lemma 2.3) that all vertices in $D_0$ have
positive degree, and in addition every two vertices $u,v$ with $d_{D_0}(u), d_{D_0}(v) \le 2$ are at
undirected distance at least 5. We then form an auxiliary bipartite graph $H_0$ with parts $L, R$,
corresponding to two copies of the vertices of $D_0$, where an edge $(x,y) \in E(D_0)$ becomes an edge
$x_L y_R \in E(H_0)$. It is quite easy to see that the existence of a perfect matching in $H_0$ implies
the existence of a spanning subgraph of $D_0$ composed of directed cycles. The probable existence of a
perfect matching in $H_0$ is shown in Lemma 2.4 using Hall's condition and standard density/expansion
arguments for random (di)graphs. The next step is to trade the factor of directed cycles in $D_0$ for
one nearly spanning cycle using extra random edges, about $O(n/\sqrt{\log n})$ of them, a negligible
quantity easily absorbed into the original random digraph. This is done using rather standard random
graph arguments and extremal statements guaranteeing the existence of a long cycle in a highly
connected digraph (see Lemma 2.5). The factoring stage is treated in Theorem 2.1.
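The correspondence between perfect matchings of the bipartite graph and cycle factors of the digraph can be made concrete. The following is a minimal Python sketch, not part of the original argument: the names `cycle_factor` and `cycles_of` are ours, and the matching is found by the standard augmenting-path method rather than by any procedure from the proof. A perfect matching assigns to each vertex $x$ a distinct successor $y$ with $(x,y)$ an edge, i.e. a permutation without fixed points, whose cycles are the directed cycles of the factor.

```python
def cycle_factor(n, edges):
    """Given a digraph on vertices 0..n-1 (list of ordered pairs), return a
    list succ with succ[x] = y describing a spanning union of vertex-disjoint
    directed cycles, or None if no such factor exists."""
    out = [[] for _ in range(n)]
    for x, y in edges:
        if x != y:                      # loops are not allowed
            out[x].append(y)
    match_right = [-1] * n              # match_right[y] = x: x_L matched to y_R

    def try_augment(x, seen):
        # Look for an augmenting path starting at the left copy of x.
        for y in out[x]:
            if y in seen:
                continue
            seen.add(y)
            if match_right[y] == -1 or try_augment(match_right[y], seen):
                match_right[y] = x
                return True
        return False

    for x in range(n):
        if not try_augment(x, set()):
            return None                 # no perfect matching, hence no factor
    succ = [None] * n
    for y, x in enumerate(match_right):
        succ[x] = y
    return succ


def cycles_of(succ):
    """Decompose the permutation succ into its cycles."""
    seen, cycles = set(), []
    for start in range(len(succ)):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = succ[v]
        cycles.append(cycle)
    return cycles
```

The recursive augmentation is meant for small illustrative inputs only; for large sparse digraphs an iterative matching routine would be the more practical choice.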

The next section contains the full details of the proof of the main result, and is followed by
concluding remarks in Section 3.



2 Proof of main result 
2.1 Filtering and factoring 

If $D$ is a directed graph we use the notation $d_D(v)$ to denote $\min\{d_D^+(v), d_D^-(v)\}$. Similarly,
we let $N_D(v) = N_D^+(v) \cup N_D^-(v)$ and in both cases may omit the subscript $D$ when there is no
danger of confusion.

For an undirected graph $G$ and a special subset of its vertices $Z$ we define a filtering process which
produces a sequence $\{X_k\}$ of disjoint subsets of the vertices as follows:

$$
\begin{aligned}
X_0 &= \emptyset, \\
X_k &= \Big\{ w \notin \bigcup_{j<k} X_j \;:\; \exists\, x, y \in \Big(\bigcup_{j<k} X_j\Big) \cup Z \text{ such that } x \ne y \text{ and } w \text{ is on a path of length } \ell \le 4 \text{ between } x, y, \\
&\qquad\quad \text{i.e. } \exists\, \{x = u_0, u_1, \ldots, u_\ell = y\} \text{ with } u_i u_{i+1} \in E(G) \Big\} \quad \text{for } k \ge 1, \\
X &= \bigcup_k X_k.
\end{aligned}
\tag{2.1}
$$
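For small graphs the filtering process (2.1) can be run by brute force. The following is a minimal Python sketch under our own conventions, not taken from the paper: the graph is a dictionary mapping each vertex to its set of neighbours, paths are taken to be simple, and the names `filtering_layers` and `_mark_paths` are hypothetical. In each round it enumerates all simple paths of length at most 4 that start and end at distinct vertices of $(\bigcup_{j<k} X_j) \cup Z$ and collects every vertex on such a path that was not already placed in an earlier $X_j$.

```python
def _mark_paths(path, adj, base, max_len, on_path):
    """Extend the simple path `path` (which starts at a base vertex) and record
    all vertices of every prefix that ends at another base vertex."""
    last = path[-1]
    if len(path) > 1 and last in base:
        on_path.update(path)            # endpoints differ because path is simple
    if len(path) - 1 == max_len:        # path already has max_len edges
        return
    for w in adj[last]:
        if w not in path:
            path.append(w)
            _mark_paths(path, adj, base, max_len, on_path)
            path.pop()


def filtering_layers(adj, Z, max_len=4):
    """Return the list [X_1, X_2, ...] of nonempty layers defined in (2.1)."""
    covered = set()                     # union of the layers found so far
    layers = []
    while True:
        base = covered | set(Z)
        on_path = set()
        for x in base:
            _mark_paths([x], adj, base, max_len, on_path)
        new_layer = on_path - covered   # w must not lie in an earlier X_j
        if not new_layer:
            return layers
        layers.append(new_layer)
        covered |= new_layer
```

Note that a vertex of $Z$ itself enters $X_1$ as soon as another vertex of $Z$ lies within distance 4 of it, which is how the definition is used in the proof of Lemma 2.3.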

The first ingredient in the proof is showing that w.h.p., once we filter the set $X$ from the graph
along with the vertices with zero in/out degree, the remaining vertices may be factored into large
cycles and thereafter combined into a single long cycle while losing only a negligible number of
vertices in the process. This is shown in the next theorem whose proof appears in Section 2.2.

Theorem 2.1. Let $D \sim \mathcal{D}(n,p)$ where $p = c/n$ for $c > 1$ fixed and let $G$ be the undirected
underlying graph of $D$. Let $Y = \{v : d_D(v) = 0\}$, $Z = \{v : d_D(v) \le 3\}$, and set $X = X(G,Z)$ as
in (2.1). Let $D_0$ be the induced subgraph of $D$ on $V(D) \setminus (X \cup Y)$ and let $D'_0$ be its
union with a random digraph $\mathcal{D}(|D_0|, (n\sqrt{\log n})^{-1})$. If $|D_0| \ge n/5$ then w.h.p. $D'_0$
contains a directed cycle on $|D_0| - o(n)$ vertices.

The following theorem, which we prove in Section 2.3, estimates the size of the filtered subset $X$
w.r.t. vertices of low in/out degree in $D$.

Theorem 2.2. Let $D \sim \mathcal{D}(n,p)$ be a random digraph with edge probability $p = c/n$ for fixed
$c > 1$. Let $Z = \{v : d_D(v) \le 3\}$ and define $X = X(G,Z)$ as in (2.1) where $G$ is the undirected
underlying graph of $D$. If $c$ is sufficiently large then with high probability
$|X| \le (2c)^{10}e^{-2c}n$.

From the above two theorems we can immediately derive our main result. 

Proof of Theorem 1. Let $Y = \{v : d_D(v) = 0\}$. Note that w.h.p. $|Y| = (2e^{-c} + o(1))n$ since
$d^+(v), d^-(v) \sim \mathrm{Bin}(n-1, c/n)$. For a sufficiently large $c$ we obtain from Theorem 2.2 that
w.h.p. $|X \cup Y| \le (2e^{-c} + (2c)^{10}e^{-2c} + o(1))n$. In particular, for large $c$ and $n$ we have
that w.h.p. $D_0$, the induced subgraph on $V(D) \setminus (X \cup Y)$, has at least $n/5$ vertices (with
room to spare) and we deduce from Theorem 2.1 that w.h.p. $\mathcal{D}(n,p')$ has a cycle missing at most
$|X \cup Y| + o(n)$ vertices, where $p' = (c/n) + (n\sqrt{\log n})^{-1} = (c + o(1))/n$. This establishes
the required result for a choice of, say, $\varepsilon(c) = 2(2c)^{10}e^{-2c}$, which makes up for $|X|$ with
an extra factor of 2 that readily absorbs the additive $o(n)$-term in $|X \cup Y|$ as well as the
$o(1)$-term in $p'$. ■



2.2 Long cycles in the filtered graph 

To prove Theorem 2.1 we first need to establish several properties of the graph $D_0$ stemming from
the definition of $X$ and the geometry of the random digraph $D$.

Lemma 2.3. Let $D_0$ be the induced subgraph on $V(D) \setminus (X \cup Y)$ and let $G_0$ be its undirected
underlying graph. Then

(i) Every $u \in D_0$ has $d_{D_0}(u) \ge 1$.

(ii) Every $u,v \in D_0$ with $d_{D_0}(u), d_{D_0}(v) \le 2$ have $\mathrm{dist}_{G_0}(u,v) \ge 5$.

Proof. To prove Part (i) assume that some $u \in V(D_0)$ has $d_{D_0}(u) = 0$ and assume without loss of
generality that $d^+_{D_0}(u) = 0$.

First consider the case where $d^+_D(u) \ge 2$. Observe that in this case there exist distinct
$x,y \notin D_0$ such that $(u,x), (u,y) \in E(D)$. Thus $u$ is on a path of length 2 between
$x,y \in X \cup Y \subset X \cup Z$, implying that $u \in X$ by definition and contradicting the fact that
$u \in V(D_0)$. The case where $d^+_D(u) = 1$ is treated similarly: here there is some vertex
$v \notin V(D_0)$ such that $(u,v) \in E(D)$, $v \in X \cup Y \subset X \cup Z$, and furthermore $u \in Z$ by
definition. Hence, $u$ is on a path of length 1 between two distinct vertices $u \ne v$ in $X \cup Z$ and
must thus also belong to $X$, in contradiction to the fact that $u \in V(D_0)$.

To prove Part (ii) let $u,v$ be vertices satisfying $d_{D_0}(u) \le 2$ and $d_{D_0}(v) \le 2$. If
$d_D(u) \ge 4$ then it necessarily lost at least 2 neighboring (in/out) vertices in $X \cup Y$ and hence
must also belong to $X$. We thus conclude that $d_D(u) \le 3$ and similarly that $d_D(v) \le 3$.

Let $G$ be the underlying undirected graph of $D$. Recalling that $u,v \in Z$ by the definition of $Z$,
there cannot be a path of length at most 4 between $u,v$ in $G$, as such a path would imply that $u,v$
must both belong to $X$. In particular, the induced subgraph $G_0 \subset G$ also satisfies
$\mathrm{dist}_{G_0}(u,v) \ge 5$, completing the proof. ■
Lemma 2.4. Let $H_0$ be the undirected bipartite graph whose parts $(L,R)$ correspond each to the
vertices of $D_0$ and where $x_L y_R \in E(H_0)$ iff $(x,y) \in E(D_0)$. Then w.h.p. $H_0$ has a perfect
matching.

Proof. Recall that by Part (i) of Lemma 2.3 there are no isolated vertices in $H_0$ (neither in $L$ nor
in $R$). Furthermore, by Part (ii) of that lemma we know that if $x,y \in L$ have degree 1 in $H_0$
then $N(x) \cap N(y) = \emptyset$ (otherwise $D_0$ would have two vertices with out-degree 1 and an
undirected distance of at most 2 between them) and similarly for $x,y \in R$ with degree 1 in $H_0$. In
other words, if we denote by $M_0$ the set of edges incident to degree 1 vertices in $H_0$ then $M_0$
consists of vertex disjoint edges. Let $H_1$ denote the bipartite graph obtained by deleting the
vertices of $M_0$ from $H_0$, i.e. $H_1 = H_0 \setminus V(M_0)$. We now claim that $H_1$ has minimum degree
at least 2. To see this, suppose that $d_{H_1}(u) \le 1$ and argue as follows.

First, we must have $d_{H_0}(u) > 1$, otherwise $u \in V(M_0)$ and hence does not belong to $H_1$. If
$d_{H_0}(u) = 2$ then there must be some $w \in V(M_0)$ such that $uw \in E(H_0)$. In particular, either
$w$ has degree 1 in $H_0$ or it is a neighbor of such a vertex, and either way we have that there exists
some degree-1 vertex $v \in H_0$ whose distance from $u$ is at most 2. The vertices corresponding to $u$
and $v$ in $D_0$ thus satisfy $d_{D_0}(u) \le 2$ and $d_{D_0}(v) \le 1$ while the undirected distance between
them is at most 2, contradicting Part (ii) of Lemma 2.3.

It thus remains to treat the case $d_{H_0}(u) \ge 3$. In this case $u$ has two neighbors
$w_1, w_2 \in V(M_0)$, giving rise to $v_1, v_2 \in V(M_0)$ whose distance from $u$ is at most 2 and with
$d_{H_0}(v_1) = d_{H_0}(v_2) = 1$. These correspond to two vertices $v_1, v_2$ in $D_0$ satisfying
$d(v_1), d(v_2) \le 1$ while the undirected distance between them is at most 4, again contradicting
Part (ii) of Lemma 2.3.

We have thus obtained that $H_1$ has a minimum degree of 2, and will now derive from this fact the
existence of a perfect matching on $H_1$. It suffices to show that w.h.p. every set
$S \subset V(H_1) \cap L$ of size at most $n/2$ has $|N(S)| \ge |S|$, as the same conclusion will carry by
symmetry to all sets $S \subset V(H_1) \cap R$ of size at most $n/2$, which would in turn imply Hall's
condition for sets $S \subset V(H_1) \cap L$ of size larger than $n/2$.

Let $S$ be a subset of $V(H_1) \cap L$ of size $s \le n/5$, let $T = N(S)$ in $H_1$ and assume that $T$ has
size $t < s$. Identifying these vertices with those of the original digraph $D$ we have that
$e(S,T) \ge 2s$ by definition of $H_1$ and the fact that it has minimum degree 2.

Moreover, observe that every $u \in S$ has at most 2 neighbors in $V(D) \setminus T$. Indeed, since $T$
includes all the neighbors of $S$ corresponding to vertices of $H_1$, any other neighbor
$v \in N_D^+(u) \setminus T$ must belong either to $X \cup Y$ or to the vertices corresponding to $V(M_0)$,
and these satisfy:

1. The vertex $u$ cannot have two distinct neighbors in $X \cup Y$, otherwise it would belong to $X$ by
definition and hence would be excluded from $D_0$.

2. The vertex $u$ cannot have two distinct neighbors in $V(M_0)$, otherwise there would exist some
$x,y$ with $d_{D_0}(x) = d_{D_0}(y) = 1$ and $\mathrm{dist}_{G_0}(x,y) \le 4$, contradicting Part (ii) of
Lemma 2.3.

Combining these arguments we conclude that $|N_D^+(u) \setminus T| \le 2$, and note that for a given
vertex $u$ and subset $T$ the probability of this event is at most

$$
\mathbb{P}\big(\mathrm{Bin}(n-t,p) \le 2\big) \le 3\binom{n-t}{2}p^2(1-p)^{n-t-2} \le 2c^2 e^{-\frac45 np},
$$

where the last inequality used the fact $t < s \le n/5$ and holds for any sufficiently large $n$ as the
$(1+O(p))$-factor was absorbed into the leading constant. Further note that the event that
$|N_D^+(u) \setminus T| \le 2$ depends only on the edges from $u$ to $V(D) \setminus T$ and therefore for
distinct vertices these events are independent.
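The binomial tail estimate used above is easy to check numerically. The following is a minimal Python sketch with hypothetical helper names (`binom_cdf`, `tail_bounds`), not part of the original argument; it evaluates the exact lower tail $\mathbb{P}(\mathrm{Bin}(n-t,p) \le 2)$ with $p = c/n$ together with the two successive upper bounds $3\binom{n-t}{2}p^2(1-p)^{n-t-2}$ and $2c^2e^{-4c/5}$. The chain is only expected to hold when $c$ is reasonably large and $t \le n/5$, as assumed in the text.

```python
import math


def binom_cdf(m, p, k):
    """Exact P(Bin(m, p) <= k) for small k."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))


def tail_bounds(n, c, t):
    """Return (exact tail, middle bound, final bound) for Bin(n - t, c/n) <= 2."""
    p = c / n
    m = n - t
    exact = binom_cdf(m, p, 2)
    middle = 3 * math.comb(m, 2) * p**2 * (1 - p) ** (m - 2)
    final = 2 * c**2 * math.exp(-0.8 * c)
    return exact, middle, final
```

Calling `tail_bounds(10000, 5.0, 2000)` returns the three quantities in increasing order, in line with the displayed inequalities at the extreme case $t = n/5$.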

At this point, the following straightforward first moment argument shows that w.h.p. $D$ cannot
contain sets $S,T$ of the above sizes where $S$ has at least $2|S|$ edges going to $T$ and every
$u \in S$ has at most 2 edges going elsewhere. Indeed, the probability that such sets exist in $D$ for
any given such $s,t$ is at most

$$
\begin{aligned}
\binom{n}{s}\binom{n}{t}\binom{st}{2s}p^{2s}\Big(2c^2e^{-\frac45 pn}\Big)^{s}
&\le \left[\frac{en}{s}\Big(\frac{en}{t}\Big)^{t/s}\Big(\frac{et}{2}\Big)^{2}\frac{c^2}{n^2}\,2c^2e^{-4pn/5}\right]^{s} \\
&\le \left[(e^4/2)\,c^4e^{-4c/5}\,(t/n)^{1-t/s}\right]^{s} \\
&= \left[(e^4/2)\,c^4e^{-4c/5}\right]^{s}(t/n)^{s-t} =: A(s,t),
\end{aligned}
$$

where we used the inequality $\binom{a}{b} \le (ea/b)^b$ and the fact that $t < s \le n/5$. For large
enough $c$ we have $c^4e^{-4c/5} < 2e^{-5}$ and so

$$
A(s,t) \le e^{-s}(s-1)/n,
$$
and summing over the possible values of $s,t$ now gives that

$$
\sum_{t<s\le n/5} A(s,t) \le \sum_{t<s\le 2\log n} A(s,t) + \sum_{\substack{2\log n<s\le n/5 \\ t<s}} A(s,t)
\le (2\log n)^2\cdot\frac{2\log n}{n} + (n/5)^2e^{-10\log n} = o(1).
$$


It remains to treat sets $S$ of size $n/5 < s \le n/2$. Verifying Hall's condition for such sets follows
immediately from the fact that w.h.p. every two sets $S,T$ of size $n/5$ in $D$ have an edge from $S$ to
$T$, as the following calculation shows:

$$
\binom{n}{n/5}^{2}(1-p)^{(n/5)^2} \le \left[(5e)^2e^{-c/5}\right]^{n/5} = o(1),
$$

where the last equality holds for a sufficiently large $c$. ■

We are now in a position to prove Theorem 2.1.

Proof of Theorem 2.1. The edges of the matching provided w.h.p. by Lemma 2.4 correspond to a
spanning subgraph of $D_0$ comprised of disjoint directed cycles. Our first step is to delete from
$D_0$ all cycles of length less than $\frac12\log_c n$. Note that the number of vertices participating in
such cycles in the original digraph $D$ is w.h.p. at most

$$
\sum_{\ell<\frac12\log_c n} \ell\, n^{\ell}p^{\ell} \le (\log_c n)\sum_{\ell<\frac12\log_c n} c^{\ell} = O(n^{1/2}\log n) = o(n).
$$

The remaining disjoint directed cycles, denoted by $C_1, \ldots, C_m$, thus contain $|D_0| - o(n)$
vertices. Note also that the total number of cycles $m$ satisfies
$m \le n/(\frac12\log_c n) = O(n/\log n)$.

Let $\mathcal{P} = \{P_i\}_{i=1}^{t}$ be a maximum collection of vertex disjoint directed paths, each of
length precisely $\lfloor\log^{0.9} n\rfloor$, formed from the edges of $\{C_i\}_{i=1}^{m}$. Since the number
of vertices uncovered by $\mathcal{P}$ in each $C_j$ is at most $\lfloor\log^{0.9} n\rfloor$ it follows that
$\mathcal{P}$ covers all but at most $m\log^{0.9} n = O(n/\log^{0.1} n)$ vertices of $D_0$. Furthermore,
recalling that $n/5 \le |D_0| \le n$, this implies that

$$
\big(\tfrac15 - o(1)\big)\,n/\log^{0.9} n \le t \le n/\log^{0.9} n. \tag{2.2}
$$


For each path $P_j \in \mathcal{P}$ define its prefix $A_j$ and suffix $B_j$ to be its first
$L = \lfloor\log^{0.8} n\rfloor$ vertices and last $L$ vertices, respectively. Consider now the digraph
$D_1 = \mathcal{D}(|D_0|, (n\sqrt{\log n})^{-1})$. We will use the edges of $D_1$ to weave most of the
vertices covered by $\mathcal{P}$ into a long directed cycle, using the edges of the paths $P_j$ as a
backbone. Define an auxiliary digraph $H$ where the vertex set $[t]$ corresponds to the paths
$P_1, \ldots, P_t$ and $(i,j) \in E(H)$ iff $D_1$ contains an edge from $B_i$ to $A_j$. Notice that if
$H$ contains a directed cycle $C = (i_1, \ldots, i_\ell)$ then $D_0 \cup D_1$ contains a directed cycle of
length at least $\ell(\log^{0.9} n - 2\log^{0.8} n)$, obtained as follows: start at the last vertex of
$A_{i_1}$ and proceed with the vertices/edges along $P_{i_1}$; use an edge from $B_{i_1}$ to $A_{i_2}$ to
jump to $P_{i_2}$, then traverse the vertices/edges along $P_{i_2}$ till an edge from $B_{i_2}$ to
$A_{i_3}$ and so on; finally use an edge from $B_{i_\ell}$ to $A_{i_1}$ and possibly some edges of
$P_{i_1}$ to close the cycle.

The digraph $H$ is a random digraph on $t = \Theta(n/\log^{0.9} n)$ vertices with edge probability $p$
that satisfies $1 - p = (1 - (n\sqrt{\log n})^{-1})^{L^2}$, implying that
$p = (1+o(1))\frac{\log^{1.1} n}{n}$. We thus need to prove that such a random digraph contains w.h.p.
an almost spanning cycle. This is an established fact, and here we derive it from the following lemma
of [3], whose short proof is included for completeness.

Lemma 2.5 ([3]). Let $D = (V,E)$ be a directed graph on $t$ vertices in which for every ordered pair
$A, B$ of disjoint vertex subsets $A, B \subset V$ of size $|A| = |B| = k$ there is an edge from $A$ to
$B$. Then $D$ contains a path of length at least $t - 2k$ and a cycle of length at least $t - 4k$.

Proof. Fix an arbitrary order $\sigma$ on the vertices of $D$ and run the DFS (Depth First Search) on
$D$, guided by $\sigma$. The DFS maintains three sets of vertices: let $S$ be the set of vertices which
we have completed exploring, $T$ be the set of unvisited vertices, and $U = V(D) \setminus (S \cup T)$,
where the vertices of $U$ are kept in a stack (a last in, first out data structure). The DFS starts
with $S = U = \emptyset$ and $T = V(D)$, and at each stage moves a vertex from $T$ to $U$ (an unvisited
vertex with an incoming edge from the top of the stack $U$) or from $U$ to $S$ until eventually all
vertices are in $S$. As such, at some point in the course of the algorithm we must have $|S| = |T|$;
consider that point, and observe crucially that all the vertices in $U$ form a directed path, and that
there are no edges from $S$ to $T$. We conclude that $|S| = |T| \le k - 1$, and therefore
$|U| \ge t - 2k + 2$, so there is a directed path with $t - 2k + 1$ edges in $D$, as required. To get a
directed cycle of the desired length, take a path as above and use a directed edge from its last $k$
vertices to its first $k$ vertices to close a cycle. ■
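The DFS argument in the proof above translates directly into code. The following is a minimal Python sketch, with the function name `dfs_split` and the natural vertex order $0, 1, \ldots, t-1$ as our own choices for $\sigma$; it runs the search until the moment $|S| = |T|$ and returns the three sets at that point. Each step moves exactly one vertex from $T$ to $U$ or from $U$ to $S$, so $|S| - |T|$ increases by one per step and the stopping moment is always reached.

```python
def dfs_split(t, out_nbrs):
    """Run DFS on a digraph with vertices 0..t-1 (out_nbrs[v] = list of
    out-neighbours) until |S| == |T|. Returns (S, U, T) where U is the stack,
    listed from bottom to top."""
    T = set(range(t))                 # unvisited vertices
    S = set()                         # fully explored vertices
    U = []                            # the stack; always a directed path
    next_idx = [0] * t                # scan position in each out-neighbour list
    while len(S) != len(T):
        if not U:
            v = min(T)                # restart from the first unvisited vertex
            T.remove(v)
            U.append(v)
            continue
        top = U[-1]
        nbrs = out_nbrs[top]
        i = next_idx[top]
        while i < len(nbrs) and nbrs[i] not in T:
            i += 1                    # visited vertices never return to T
        next_idx[top] = i
        if i < len(nbrs):
            w = nbrs[i]
            T.remove(w)
            U.append(w)               # edge top -> w extends the path
        else:
            S.add(U.pop())            # top has no out-neighbour left in T
    return S, U, T
```

At the returned moment consecutive vertices of `U` are joined by edges of the digraph and no vertex of `S` has an out-neighbour in `T`, which are the two facts the proof relies on.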

In order to apply the above lemma, take $k = \lfloor n/\log n\rfloor$ while recalling that
$H \sim \mathcal{D}(t,p)$ with $t$ satisfying (2.2) and $p = (1+o(1))\frac{\log^{1.1} n}{n}$. As
$t \le n/\log^{0.9} n$, the probability that $H$ has two disjoint vertex sets $A, B$ of cardinality $k$
each with no edges from $A$ to $B$ is at most

$$
\binom{t}{k}^{2}(1-p)^{k^2} \le \left[(et/k)^2e^{-pk}\right]^{k} \le \left[(e+o(1))^2(\log n)^{0.2}\cdot e^{-(1-o(1))\log^{0.1} n}\right]^{k} = o(1).
$$

Thus w.h.p. $H$ satisfies the conditions of Lemma 2.5 and in turn it contains w.h.p. a cycle of length
at least $t - 4k = (1 - o(1))t$. As explained above, it follows that w.h.p. the digraph
$D_0 \cup D_1$ contains a directed cycle covering all but
$O(k\log^{0.9} n) + O(n/\log^{0.1} n) = o(n)$ vertices, as required. ■



2.3 Controlling the effect of the filtering process 

Proof of Theorem 2.2. An important element in the proof would be to analyze the set $X$ with
respect to a subgraph of $D \sim \mathcal{D}(n,p)$ with a reasonably large undirected girth. To this end we
need the following lemma.

Lemma 2.6. Let $G$ be an undirected graph with girth $g$ and let $Z$ be a subset of its vertices. Define
$X(G,Z)$ as in (2.1). For every $1 \le k \le g/8$ and $v \in X_k$ there is a tree $T_v \subset G$ rooted at
$v$ whose leaves are in $Z$ and interior vertices are in $\bigcup_{j\le k} X_j$. Moreover, $T_v$ has at
most $5(|T_v \cap Z| - 1)$ vertices, at most $4k$ levels (including the root) and its number of leaves
$\ell$ satisfies $k < \ell \le 2^k$.

Proof. We proceed by induction on $k$. For the induction base recall that if $v \in X_1$ then there are
2 vertices $x,y \in Z$ such that $v$ is on a path of length at most 4 between $x,y$ in $G$. Treat this
path as a tree $T_v$ rooted at $v$, and notice that it has 2 leaves, at most 4 levels including the
root (as $\mathrm{dist}(v,x), \mathrm{dist}(v,y) \le 3$) and the induced subgraph on it in $G$ is a tree by
the girth assumption on $G$. Furthermore, $T_v$ has at most $5 \le 5(|T_v \cap Z| - 1)$ vertices since
$|T_v \cap Z| \ge 2$, thus satisfying the statement of the lemma.

Next, let $k > 1$ and let $v \in X_k$. Let $x,y \in Z \cup \bigcup_{j<k} X_j$ be the endpoints of a
shortest path $P$ containing $v$ (by definition (2.1) the path $P$ has length at most 4). Suppose first
that one of these vertices belongs to $Z$, i.e. without loss of generality $x \in Z$ whereas
$y \in X_{k-1}$ (otherwise $v$ would have belonged to some $X_j$ with $j < k$). Define the tree $T_v$ as
a path $P_y$ of length $\mathrm{dist}_G(v,y)$ from the root $v$ to the sub-tree $T_y$, provided by the
induction, together with another path $P_x$ of length $\mathrm{dist}_G(v,x)$ from $v$ to $x$. On one hand,
the paths $P_x, P_y$ are disjoint by definition, and furthermore, excluding their endpoints, their
vertices do not belong to $Z \cup \bigcup_{j<k} X_j$ by the minimality of $P$ and in particular do not
belong to $T_y$. On the other hand, if the path $P_x$ does intersect $T_y$, which was guaranteed to
have at most $4(k-1)$ levels by induction, then together with $P_y$ they complete a cycle of length at
most $4(k-1) + 4 \le 4k \le g/2$ in $G$, contradicting its girth assumption. We conclude that $T_v$ is
indeed a tree, with at most $4(k-1) + 4 = 4k$ levels including the root. Finally,
$|T_v \cap Z| = |T_y \cap Z| + 1$, hence the induction hypothesis and the fact that $T_v$ adds at most
4 vertices to $T_y$ together imply that

$$
|T_v| \le |T_y| + 4 \le 5(|T_y \cap Z| - 1) + 4 < 5(|T_v \cap Z| - 1).
$$

It remains to treat the case where $x \in X_j$ for some $j < k$ while $y \in X_{k-1}$. As before, if
$T_x \cap T_y \ne \emptyset$ then together with the path $P$ we obtain a cycle of length at most
$8(k-1) + 4 < 8k \le g$ in $G$, contradicting the girth assumption. Otherwise,
$|T_v \cap Z| = |T_x \cap Z| + |T_y \cap Z|$ and so our hypothesis on $T_x, T_y$ gives that

$$
|T_v| \le |T_x| + |T_y| + 3 \le 5(|T_x \cap Z| - 1) + 5(|T_y \cap Z| - 1) + 3 < 5(|T_v \cap Z| - 1).
$$

Noting that the number of leaves $\ell(T_v)$ was either $\ell(T_y) + 1$ or $\ell(T_x) + \ell(T_y)$
immediately implies that $k + 1 \le \ell(T_v) \le 2^k$ and completes the proof of the lemma. ■

Let $G$ denote the undirected underlying graph of $D \sim \mathcal{D}(n,p)$, and define $C \subset V(G)$ to
be comprised of all vertices that belong to cycles of length at most

$$
R = (20/c)\log n
$$

in $G$. Since each edge appears in $G$ with probability at most $2p$ independently of other edges, the
expected number of cycles of length $r$ in $G$ is at most $n^r(2p)^r/r$ and thus

$$
\mathbb{E}|C| \le \sum_{r\le R}(2c)^r \le 2(2c)^{R} < n^{1/5},
$$

where we used the fact that $c/\log(2c) > 100$ for sufficiently large $c$. In particular,
$|C| < n^{1/4}$ w.h.p.

Define $Z' = C \cup Z$ and let $D'$ be the graph obtained by deleting all inner edges between vertices
of $C$ (i.e. all edges of the induced subgraph on $C$). Let $G'$ denote the undirected underlying graph
of $D'$ and for all $k$ let $X'_k$ denote the set $X_k(G', Z')$ defined via (2.1). A key observation is
that

$$
X \subset X' \cup C. \tag{2.3}
$$

To see this, recall that $X_0 = \emptyset$ and assume by induction that

$$
\bigcup_{j<k} X_j \subset \Big(\bigcup_{j<k} X'_j\Big) \cup C \quad \text{for some } k \ge 1.
$$

Let $v \in X_k \setminus C$. Let $P$ be a shortest path containing $v$ in $G$ with endpoints
$x,y \in (\bigcup_{j<k} X_j) \cup Z$. By the definition of $X_k$ and the minimality of $P$ we know that
$P$ has $1 \le \ell \le 4$ edges and none of its interior vertices belongs to
$(\bigcup_{j<k} X_j) \cup Z$. Consider the two sub-paths from $v$ to $x,y$ (one of which is possibly
empty) and let $x', y'$ be the first vertices on these respective paths that belong to
$(\bigcup_{j<k} X_j) \cup C \cup Z$. Since $x,y$ clearly belong to this set, this defines a sub-path
$P'$ of length $1 \le \ell' \le \ell \le 4$ that contains $v$ in $G$. Crucially, since $v \notin C$ the
path $P'$ has no interior vertices in $C$ and therefore all of its edges belong to $G'$. Finally, the
induction hypothesis ensures that $x', y' \in (\bigcup_{j<k} X'_j) \cup Z'$ and we conclude that
$v \in (\bigcup_{j\le k} X'_j) \cup Z'$, completing the induction.

Our next goal is to provide an upper bound on $|X'|$ which is linear in $n$ (with a suitably small
coefficient), absorbing the negligible contribution to it from vertices in $C$. Observe that by
definition the girth of $G'$ is larger than $R = (20/c)\log n$ and set

$$
K = \lfloor(2/c)\log n\rfloor. \tag{2.4}
$$

Invoking Lemma 2.6 w.r.t. $G'$, for each $k \le K$ we can bound $|X'_k|$ from above by
$|\mathcal{T}'_k|$ where

$$
\mathcal{T}'_k = \left\{ T \subset G' \;:\;
\begin{array}{l}
T \text{ is a labeled rooted tree with } \ell \text{ leaves, } k < \ell \le 2^k, \text{ all belonging to } Z', \\
\text{a total of } t \text{ vertices for } t \le 5(\ell - 1) \text{ and at most } 4k \text{ levels}
\end{array}
\right\}. \tag{2.5}
$$

In particular, we will be able to assert that $X'_K = \emptyset$ by showing that $\mathcal{T}'_K$ is empty,
as the next lemma establishes.

Lemma 2.7. Set $K$ as in (2.4). With high probability $X'_K = \emptyset$.

Proof. In what follows let $L(T)$ denote the set of leaves of a tree $T$ and recall that if
$T \in \mathcal{T}'_k$ then $L(T) \subset Z' = C \cup Z$ by definition.

Let $\mathcal{T}^*_k$ be the set of all trees in $\mathcal{T}'_k$ where at least $\ell - 1$ of the leaves
belong to $Z$. We have $\binom{n}{t}$ choices for the vertices of $T \in \mathcal{T}^*_k$ on $t$ vertices,
and the well-known Cayley formula asserts that the number of labeled rooted trees on $t$ vertices is
$t^{t-1}$. The probability that a given labeled tree on $t$ vertices is in $G$ (an upper bound on the
probability it belongs to $G' \subset G$) is at most $(2p)^{t-1}$. Finally, if $u \in L(T) \cap Z$ then
by definition $d_D(u) \le 3$ and in particular $d_{D\setminus T}(u) \le 3$. Crucially, the events
$\{d_{D\setminus T}(u) \le 3\}$ for $u \in L(T)$ are mutually independent as well as independent of all
the interior edges of $T$ (accounted for in the probability that $T \subset G$). Altogether, for all
$k \le K$,

$$
\begin{aligned}
\mathbb{E}|\mathcal{T}^*_k| &\le \sum_{\ell=k+1}^{2^k}\ \sum_{t\le 5(\ell-1)} \binom{n}{t}\,t^{t-1}(2p)^{t-1}\big(2\,\mathbb{P}(\mathrm{Bin}(n-t,p) \le 3)\big)^{\ell-1} \\
&\le \sum_{\ell=k+1}^{2^k}\ \sum_{t\le 5(\ell-1)} e\,(2ec)^{t-1}\big(2\,\mathbb{P}(\mathrm{Bin}(n-t,p) \le 3)\big)^{\ell-1}\,n,
\end{aligned}
\tag{2.6}
$$

where in the last inequality we used the facts that $\binom{n}{t} \le (en/t)^t$ and $t \ge \ell$. If
$c$ is sufficiently large then $n - t = (1 - o(1))n$ as $t \le 5\cdot 2^K = o(n)$ and in particular
$\mathbb{P}(\mathrm{Bin}(n-t,p) \le 3) \le \frac12 c^3e^{-c}$. Plugging this in (2.6) gives that for
sufficiently large $n$,

$$
\begin{aligned}
\mathbb{E}|\mathcal{T}^*_k| &\le e\sum_{\ell=k+1}^{2^k}\ \sum_{t\le 5(\ell-1)} (2ec)^{t-1}\big(c^3e^{-c}\big)^{\ell-1}\,n \\
&\le \sum_{\ell=k+1}^{2^k} \big((2e)^5c^8e^{-c}\big)^{\ell-1}\,n \le n\,e^{-\frac34 ck},
\end{aligned}
\tag{2.7}
$$

where the last inequality holds for large enough $c$ and $n$. Substituting
$k = K = \lfloor(2/c)\log n\rfloor$ now gives

$$
\mathbb{E}|\mathcal{T}^*_K| \le n\,e^{-\frac34 cK} = O(n^{-1/2}) = o(1),
$$

hence w.h.p. $\mathcal{T}^*_K = \emptyset$.

Now consider $T \in \mathcal{T}'_K \setminus \mathcal{T}^*_K$. Here there exist distinct
$u_i, u_j \in L(T) \cap C$. As $T$ has at most $4K$ levels and connects $u_i, u_j$ in $G'$ (where the
inner edges between the vertices of $C$ are absent) this implies the existence of a subgraph
$F \subset G$ with $m$ vertices and at least $m + 1$ edges such that

$$
m \le 2R + 8K + 2 \le (60/c)\log n
$$

(accounting for $u_i$ and $u_j$, a path of length at most $8K$ between them and up to 2 cycles in $C$,
with the last inequality holding for large enough $c$). When $c$ is sufficiently large, the
probability that such a graph $F$ belongs to $G$ is at most

[display (2.8) missing from the source]

implying that $\mathcal{T}'_K \setminus \mathcal{T}^*_K$, and hence also $\mathcal{T}'_K$, is w.h.p. empty. By
(2.5) and the remark following that definition it now follows that $X'_K = \emptyset$ w.h.p., as
required. ■

It remains to estimate $|\bigcup_{k<K} X'_k|$. To this end, let $B(C,R/2)$ be the set of all vertices
whose undirected distance from $C$ in $G$ is less than $R/2 = (10/c)\log n$. Consider some
$v \in X'_k$ for some $k < K$, let $T_v$ be the corresponding tree provided by Lemma 2.6 and suppose
first that some leaf $u$ in $T_v$ belongs to $C$ (recall that every leaf of $T_v$ is in $C \cup Z$ by
(2.5)). Since by definition $T_v$ has at most $4k \le 4K \le (8/c)\log n$ levels it follows that
$T_v \subset B(C,R/2)$ and in particular $v \in B(C,R/2)$. Due to this argument, if we let
$\mathcal{T}_k$ denote the set of rooted trees in $\mathcal{T}'_k$ where all leaves belong to $Z$ and let
$Y_k$ denote the number of vertices serving as roots of such trees, i.e.

$$
\begin{aligned}
\mathcal{T}_k &= \left\{ T \subset G' \;:\;
\begin{array}{l}
T \text{ is a labeled rooted tree with } \ell \text{ leaves, } k < \ell \le 2^k, \text{ all belonging to } Z, \\
\text{a total of } t \text{ vertices for } t \le 5(\ell - 1) \text{ and at most } 4k \text{ levels}
\end{array}
\right\}, \\
Y_k &= \#\{v \in V(G) : v \text{ is the root of } T \text{ for some } T \in \mathcal{T}_k\}
\end{aligned}
$$

(notice that clearly $Y_k \le |\mathcal{T}_k|$ for any $k$), then

$$
\Big|\bigcup_{k<K} X'_k\Big| \le |B(C,R/2)| + \sum_{k<K} Y_k.
$$

To estimate the size of $B(C,R/2)$ observe that each vertex $v$ in this set corresponds to a graph on
$m \le 3R/2$ vertices and at least $m$ edges. We can therefore repeat the calculation in (2.8) to get
that

[display missing from the source]

where the last inequality is valid for large $c$. In particular, $|B(C,R/2)| \le n^{3/4}$ w.h.p. and it
remains to estimate $\sum_{k<K} Y_k$.

Consider $|\mathcal{T}_1|$, counting rooted labeled trees in $G$ with 2 leaves (i.e. paths with a distinguished
vertex) and at most 5 vertices and where both leaves are in $Z$. Conditioned on the existence of a
given labeled path $P$ in $G$, the probability that its endpoints are in $Z$ is less than the probability that
each endpoint has at most 3 in-neighbors or at most 3 out-neighbors in $D \setminus P$. Altogether,

\[ \mathbb{E} Y_1 \le \mathbb{E}|\mathcal{T}_1| \le \sum_{2 \le t \le 5} t\, n^t (2p)^{t-1} \bigl( 2\,\mathbb{P}(\mathrm{Bin}(n-t,p) \le 3) \bigr)^2 \]

\[ \le 4 \cdot 5 (2c)^4 \bigl( 2\,\mathbb{P}(\mathrm{Bin}(n-5,p) \le 3) \bigr)^2 n \le 20 (2c)^4 (c^3 e^{-c})^2 n \le 600 c^{10} e^{-2c} n , \]

where the last inequality holds for sufficiently large n. 
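The chain of bounds on $\mathbb{E}|\mathcal{T}_1|$ can be checked numerically. The following is only a sanity-check sketch: the function names and the sample values $c = 20$, $n = 10^6$ are illustrative assumptions, not part of the proof, and every quantity is divided by $n$.

```python
import math

def binom_cdf(n, p, k):
    """P(Bin(n, p) <= k), summed term by term (k is tiny here)."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def first_moment_chain(n, c):
    """Successive upper bounds on E|T_1| / n, in the order of the display."""
    p = c / n
    # t * n^t * (2p)^(t-1) / n simplifies to t * (2c)^(t-1).
    path_sum = sum(
        t * (2 * c) ** (t - 1) * (2 * binom_cdf(n - t, p, 3)) ** 2
        for t in range(2, 6)
    )
    worst_term = 20 * (2 * c) ** 4 * (2 * binom_cdf(n - 5, p, 3)) ** 2
    poisson_like = 20 * (2 * c) ** 4 * (c**3 * math.exp(-c)) ** 2
    final = 600 * c**10 * math.exp(-2 * c)
    return path_sum, worst_term, poisson_like, final

if __name__ == "__main__":
    # Illustrative parameters only: c = 20, n = 10^6.
    for value in first_moment_chain(10**6, 20.0):
        print(f"{value:.3e}")
```

Each printed value should be at most the next one. For small $c$ the final bound exceeds 1 and is therefore trivial, which is consistent with the estimate being stated for large $c$.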

Next examine $|\mathcal{T}_k|$ for $2 \le k < K$, which counts trees with at most $4k$ levels, $\ell \in \{k+1, \ldots, 2^k\}$
leaves and a total of $t \le 5(\ell - 1)$ vertices, where all leaves are in $Z$. The calculation in (2.6), (2.7),
with the single change that now all $\ell$ leaves (rather than $\ell - 1$) belong to $Z$, yields

2 fc 

E*fc<E|7fc|< Y ((2efc 8 e- c ) e n<e-'i c( - k+1 ^n, (2.9) 

i=k+l 

and combining the above inequalities we deduce that for large enough c and n we have 

\[ \sum_{k<K} \mathbb{E} Y_k \le 1000 c^{10} e^{-2c} n . \tag{2.10} \]

To assess the deviation of the $Y_k$'s from their mean, set $K_0 = \lfloor \log\log n \rfloor$ and observe that

Y EY k < 2e~i cKo n < n/log 2 n, 

K <k<K 
with the last inequality easily holding for $c$ large. Applying Markov's inequality we deduce that

\[ \mathbb{P}\Bigl( \sum_{K_0 \le k < K} Y_k \ge n/\log n \Bigr) \le 1/\log n = o(1) . \tag{2.11} \]
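For completeness, the Markov step can be spelled out as follows. This is a short worked version that uses only the bound $n/\log^2 n$ on the sum of expectations stated above:

```latex
\[
\mathbb{P}\Bigl(\sum_{K_0 \le k < K} Y_k \ge \frac{n}{\log n}\Bigr)
\;\le\; \frac{\log n}{n}\,\sum_{K_0 \le k < K} \mathbb{E} Y_k
\;\le\; \frac{\log n}{n}\cdot\frac{n}{\log^2 n}
\;=\; \frac{1}{\log n}.
\]
```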

It remains to estimate the $Y_k$'s for $k < K_0$. To this end, define $Y'_k$ to be the number of roots of trees
$T \in \mathcal{T}_k$ such that every vertex in $T$ has degree less than $\log^2 n$ in $G$:

\[ Y'_k = \#\{ v \in V(G) \;:\; v \text{ is the root of } T \text{ for some } T \in \mathcal{T}_k \text{ and } \deg_G(u) < \log^2 n \text{ for all } u \in T \} . \]

Recall that the underlying graph $G$ is obtained from $D$ by erasing its edge directions. Therefore, $G$
itself is a random undirected graph $\mathcal{G}(n,p')$ with edge probability $p' = 1 - (1-p)^2 = (1+o(1))2p$. As
we will formally state later, $G \sim \mathcal{G}(n,p')$ has maximum degree less than $\log^2 n$ except with extremely
low (super-polynomially small) probability, and so $Y_k = Y'_k$ w.h.p. We will show that $Y'_k$ is concentrated
about its mean and then use it to derive concentration for $Y_k$.

Let $(M_t)$ be the edge-exposure Doob's martingale for $D$; that is, let $e_1, \ldots, e_{\binom{n}{2}}$ be an arbitrary
ordering of the edges of the complete graph on $n$ vertices and set $M_t = \mathbb{E}[Y'_k \mid \mathcal{F}_t]$ where $\mathcal{F}_t$ is the
$\sigma$-algebra corresponding to revealing the indicators $\{ \mathbf{1}_{\{e_i \in E(D)\}} : i \le t \}$. We are interested in bounds
on the increments of the martingale $(M_t)$ in $L^\infty$ and $L^2$.

Consider the effect of modifying one of the indicators $\mathbf{1}_{\{e \in E\}}$; clearly this can create or destroy a
tree $T \in \mathcal{T}_k$ only if that tree includes an endpoint of $e$ as one of its vertices. Since $Y'_k$ counts roots
of such trees where every vertex has degree less than $\log^2 n$ and by the definition of $\mathcal{T}_k$ each such
tree has at most $4k$ levels (including the root), it follows that modifying $e$ can alter $Y'_k$ by at most
$(\log^2 n)^{4k}$. In other words, $Y'_k$ is $B$-Lipschitz as a function of the edges of $D$, where

\[ B = (\log^2 n)^{4k} \le (\log^2 n)^{4K_0} \le \exp\bigl( 8 (\log\log n)^2 \bigr) = n^{o(1)} . \]

It is a well-known (and easy to show) corollary that in this case $|M_{t+1} - M_t| \le B$ for all $t$ (see, e.g. [2]
for the standard coupling argument deriving this for Doob's martingale of Lipschitz functions).
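The statement $B = n^{o(1)}$ is asymptotic, and the exponent $\log B / \log n$ decays slowly. The sketch below tabulates this exponent for a few values of $\log n$. The function name and the sample values are illustrative assumptions; $\log n$ is passed directly because the relevant $n$ are far too large to store as floats.

```python
import math

def lipschitz_exponents(log_n):
    """Return (log B / log n, 8 (log log n)^2 / log n) for
    B = (log^2 n)^(4 K0) with K0 = floor(log log n)."""
    loglog_n = math.log(log_n)
    k0 = math.floor(loglog_n)
    # log B = 4 * K0 * log(log^2 n) = 8 * K0 * log log n
    log_b = 8 * k0 * loglog_n
    upper = 8 * loglog_n**2
    return log_b / log_n, upper / log_n

if __name__ == "__main__":
    # Illustrative values of log n only.
    for log_n in (1e3, 1e6, 1e9):
        exact, upper = lipschitz_exponents(log_n)
        print(f"log n = {log_n:.0e}: exponent {exact:.3e} <= {upper:.3e}")
```

Both exponents tend to 0, but only once $\log n$ is much larger than $(\log\log n)^2$, so for moderate $n$ the Lipschitz bound is far from negligible.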

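Now assume that we have exposed $\mathbf{1}_{\{e_1 \in E\}}, \ldots, \mathbf{1}_{\{e_t \in E\}}$ and are about to reveal whether or not
$e_{t+1} \in E$. We wish to bound $\mathrm{Var}(M_{t+1} \mid \mathcal{F}_t)$. If we let $\theta = \mathbb{E}[Y'_k \mid \mathcal{F}_t,\, e_{t+1} \notin E]$ then the shifted
variable $Q = M_{t+1} - \theta$ satisfies $\mathbb{P}(Q \ne 0 \mid \mathcal{F}_t) = \mathbb{P}(e_{t+1} \in E \mid \mathcal{F}_t) = p' \le 2p$ whereas $|Q| \le 2B$ by the
assumption that $\mathbb{P}(|M_{t+1} - M_t| \le B) = 1$ (in fact, even more precisely, one has $|Q| \le B$ due to the
$B$-Lipschitz property of $Y'_k$). Thus,

\[ \mathrm{Var}(M_{t+1} \mid \mathcal{F}_t) = \mathrm{Var}(Q \mid \mathcal{F}_t) \le 2p (2B)^2 = n^{-1+o(1)} , \]

and we conclude that for some $L = n^{1+o(1)}$ we have $\sum_t \mathrm{Var}(M_t \mid \mathcal{F}_{t-1}) \le L$ with probability 1.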

We are now in a position to apply the following large-deviation inequality which is a special case
of a result of Freedman [5, Theorem 1.6] (see also [12, Theorem 3.15]):

Theorem 2.8. Let $(S_0, S_1, \ldots, S_N)$ be a martingale with respect to the filtration $(\mathcal{F}_i)$. Assume that
$S_{i+1} - S_i \le B$ for all $i$ and that $\sum_{i=1}^{N} \mathrm{Var}(S_i \mid \mathcal{F}_{i-1}) \le L$ with probability 1 for some $L > 0$. Then
for any $s > 0$ we have $\mathbb{P}\bigl( \bigcup_{i=1}^{N} \{ S_i \ge S_0 + s \} \bigr) \le \exp\bigl[ -\tfrac{1}{2} s^2 / (L + Bs) \bigr]$.
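Plugging in our estimates for $|M_{t+1} - M_t|$ and $\sum_t \mathrm{Var}(M_t \mid \mathcal{F}_{t-1})$ while recalling that by definition
of the Doob martingale $M_0 = \mathbb{E} Y'_k$ while $M_{\binom{n}{2}} = Y'_k$ it now follows that

\[ \mathbb{P}\bigl( |Y'_k - \mathbb{E} Y'_k| \ge s \bigr) \le 2 \exp\Bigl( -\tfrac{1}{2} s^2 \big/ \bigl( n^{1+o(1)} + n^{o(1)} s \bigr) \Bigr) \]

and in particular

\[ \mathbb{P}\bigl( |Y'_k - \mathbb{E} Y'_k| \ge n^{3/4} \bigr) \le \exp\bigl( -n^{1/2 - o(1)} \bigr) . \tag{2.12} \]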
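To see how Theorem 2.8 gives a tail of order $\exp(-n^{1/2-o(1)})$ at $s = n^{3/4}$, one can evaluate the exponent for concrete numbers. In this sketch the small constant `delta` is a hypothetical stand-in for the $o(1)$ terms in $L = n^{1+o(1)}$ and $B = n^{o(1)}$, and the function name is an assumption made for illustration.

```python
import math

def freedman_log_bound(s, L, B):
    """Natural log of exp(-s^2 / (2 (L + B s))), the bound in Theorem 2.8."""
    return -0.5 * s * s / (L + B * s)

if __name__ == "__main__":
    n = 10**12
    delta = 0.05  # hypothetical stand-in for the o(1) exponents
    L = n ** (1 + delta)
    B = n**delta
    s = n**0.75
    log_bound = freedman_log_bound(s, L, B)
    # The term B*s = n^(3/4 + delta) is dominated by L = n^(1 + delta),
    # so the exponent is close to -n^(1/2 - delta) / 2.
    print(log_bound, -0.5 * n ** (0.5 - delta), -10 * math.log(n))
```

With these sample values the exponent is far below $-10\log n$, so the failure probability is much smaller than any fixed power of $n$ that appears in the remainder of the argument.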

To complete the proof, recall that the probability that some vertex in $G \sim \mathcal{G}(n,p')$ has degree
at least $\log^2 n$ is at most

\[ n\,\mathbb{P}\bigl( \mathrm{Bin}(n, 2c/n) \ge \log^2 n \bigr) \le n \exp(-c' \log^2 n) \le n^{-10} , \]

where the last inequality holds for large enough $n$. In particular, $Y'_k = Y_k$ except with probability
$n^{-10}$ and since by definition $0 \le Y'_k \le Y_k \le n$ we further have $\mathbb{E}[Y_k] = \mathbb{E}[Y'_k] + O(n^{-9})$. Combining
these inequalities with (2.12) now gives

\[ \mathbb{P}\bigl( |Y_k - \mathbb{E} Y_k| \ge 2n^{3/4} \bigr) \le 2n^{-10} , \]

where the extra factors of 2 absorbed the $O(n^{-9})$ and $\exp(-n^{1/2-o(1)})$ error terms. In particular,
taking a union bound over the $K_0 \le \log\log n$ values of $k$ we deduce that w.h.p.

\[ \sum_{k<K_0} Y_k - \sum_{k<K_0} \mathbb{E} Y_k \le 2n^{3/4} \log\log n . \]
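The maximum-degree estimate used above, $n\,\mathbb{P}(\mathrm{Bin}(n,2c/n) \ge \log^2 n) \le n^{-10}$, can be checked exactly for moderate parameters by summing the binomial tail in log-space. This is an illustrative sketch: the helper names and the sample values $n = 10^5$, $c = 10$ are assumptions, and the text only claims the bound for large enough $n$.

```python
import math

def log_binom_pmf(n, p, k):
    """log P(Bin(n, p) = k), via log-gamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def log_binom_upper_tail(n, p, k0):
    """log P(Bin(n, p) >= k0), using the log-sum-exp trick."""
    logs = [log_binom_pmf(n, p, k) for k in range(k0, n + 1)]
    top = max(logs)
    return top + math.log(sum(math.exp(x - top) for x in logs))

def log_degree_union_bound(n, c):
    """log of n * P(Bin(n, 2c/n) >= log^2 n)."""
    threshold = math.ceil(math.log(n) ** 2)
    return math.log(n) + log_binom_upper_tail(n, 2 * c / n, threshold)

if __name__ == "__main__":
    n, c = 10**5, 10.0  # illustrative parameters only
    print(log_degree_union_bound(n, c), -10 * math.log(n))
```

For these values the first printed number is smaller than the second, that is, the union bound is already below $n^{-10}$.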

Finally, combining this inequality with (2.10) and (2.11) we conclude that w.h.p.

\[ \sum_{k<K} Y_k \le 1000 c^{10} e^{-2c} n + 2n^{3/4} \log\log n + n/\log n = (1000 + o(1)) c^{10} e^{-2c} n , \]

where the last equality holds as $n \to \infty$. Together with the aforementioned bounds on $X$
in terms of $X'$ and in turn of $X'$ in terms of $\sum_{k<K} Y_k$ we conclude that w.h.p.

\[ |X| \le |C| + |B(C,R/2)| + \sum_{k<K} Y_k \le n^{1/4} + n^{3/4} + (1000 + o(1)) c^{10} e^{-2c} n \le (2c)^{10} e^{-2c} n , \]

where the last inequality is valid for any sufficiently large $n$, as required. ■

3 Concluding remarks 

We have proved that a random directed graph $\mathcal{D}(n,c/n)$ contains with high probability a directed
cycle including all but at most $(2+\varepsilon)e^{-c} n$ vertices, where $\varepsilon = \varepsilon(c) \to 0$ as $c \to \infty$. In fact, our
proof shows that the relative error term $\varepsilon(c)$ is exponentially small in $c$, namely $\varepsilon(c) \le \mathrm{poly}(c) e^{-c}$.
The main term in the result is asymptotically optimal as such a random digraph typically contains
$(2e^{-c} - o(1))n$ vertices with zero in-degree or out-degree.

It would be very interesting to derive accurate estimates for the length of a longest cycle in
$\mathcal{D}(n, c/n)$ for small(er) values of the constant $c$, starting perhaps as low as the threshold for the
appearance of a linear length cycle in such a random digraph. See the related work [10] where Łuczak
studied the length of the longest cycle in the undirected random graph near its critical window,
showing lower and upper bounds that are tight up to a factor of $1 + \log(3/2) \approx 1.41$.

Compared to the situation in undirected graphs, the toolkit available for the case of directed
graphs is rather poor at present, which makes progress on a variety of questions about directed
random and pseudo-random graphs much harder to achieve. In particular, the absence of any form
of a direct analogue of the famed Pósa rotation-extension technique, widely applied for undirected
graphs, is felt throughout. It would be very useful to derive some directed version of it.

In general, the field of random and pseudo-random directed graphs is largely an uncharted territory,
compared to the situation for the undirected case. Although this is certainly partly due to its
relative difficulty, we believe enough knowledge and technology have been accumulated now to start 
exploring it in a systematic way. One recent such result is the paper [3], where global resilience type 
results with respect to long cycles have been derived for sparse random and pseudo-random directed 
graphs. It would be interesting to explore further resilience type questions in directed graphs. 